NLP Applications of Sinhala: TTS & OCR

نویسندگان

  • Ruvan Weerasinghe
  • Asanka Wasala
  • Dulip Lakmal Herath
  • Viraj Welgama
چکیده

This paper brings together the practical applications and the evaluation of the first Text-to-Speech (TTS) system for Sinhala using the Festival framework and an Optical Character Recognition system for Sinhala.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Festival-si: A Sinhala Text-to-Speech System

This paper brings together the development of the first Text-to-Speech (TTS) system for Sinhala using the Festival framework and practical applications of it. Construction of a diphone database and implementation of the natural language processing modules are described. The paper also presents the development methodology of direct Sinhala Unicode text input by rewriting Letter-to-Sound rules in...

متن کامل

Recognition of Printed Sinhala Characters Using Linear Symmetry

Sinhala characters used in the Sinhala script by over 70% of the 18 million population in Sri Lanka, have been descended from the ancient Brahmi script. The Sinhala alphabet consists of vowels and consonants and the consonants are modified using modifier symbols to give the required vocal sounds. In the process of developing an OCR for the Sinhala script, characters are initially recognised thr...

متن کامل

Text-to-Speech Synthesis for Mandarin Chinese

A Text-To-Speech (TTS) synthesizer is a computer-based system that is able to automatically read text aloud, regardless whether the text is introduced by computer input stream or a scanned input that is submitted to an optical character recognition (OCR) engine. TTS synthesis can be used in many areas, such as telecommunication services, language education, vocal monitoring, multimedia, and as ...

متن کامل

Lexicon and hidden Markov model-based optimisation of the recognised Sinhala script

The Brahmi descended Sinhala script is used by 75% of the 18 million population in Sri Lanka. To the best of our knowledge, none of the Brahmi descended scripts used by hundreds of millions of people in South Asia, possess commercial OCR products. In the process of implementation of an OCR system for the printed Sinhala script which is easily adoptable to similar scripts [Premaratne, L., Assabi...

متن کامل

A Generative Probabilistic OCR Model for NLP Applications

In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. The model is designed for use in error correction, with a focus on post-processing the output of black-box OCR systems in o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008